LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D Signals
Finding localized correspondences across different images of the same object
is crucial to understand its geometry. In recent years, this problem has seen
remarkable progress with the advent of deep learning-based local image features
and learnable matchers. Still, learnable matchers often underperform when only
small regions of co-visibility exist between image pairs (i.e., wide camera
baselines). To address this problem, we leverage recent progress in
coarse single-view geometry estimation methods. We propose LFM-3D, a Learnable
Feature Matching framework that uses models based on graph neural networks and
enhances their capabilities by integrating noisy, estimated 3D signals to boost
correspondence estimation. When integrating 3D signals into the matcher model,
we show that a suitable positional encoding is critical to effectively make use
of the low-dimensional 3D information. We experiment with two different 3D
signals - normalized object coordinates and monocular depth estimates - and
evaluate our method on large-scale (synthetic and real) datasets containing
object-centric image pairs across wide baselines. We observe strong feature
matching improvements compared to 2D-only methods, with up to +6% total recall
and +28% precision at fixed recall. Additionally, we demonstrate that the
resulting improved correspondences lead to much higher relative posing accuracy
for in-the-wild image pairs - up to 8.6% compared to the 2D-only approach.
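The abstract notes that a suitable positional encoding is critical for the matcher to exploit low-dimensional 3D signals. A common way to lift such signals into a higher-dimensional space is a sinusoidal (Fourier-feature) encoding; the sketch below illustrates that general idea and is not the paper's exact configuration - the frequency schedule and `num_freqs` are illustrative assumptions.

```python
import numpy as np

def fourier_positional_encoding(xyz, num_freqs=8):
    """Lift low-dimensional 3D signals (e.g. normalized object
    coordinates, shape (N, 3)) to a higher-dimensional embedding
    with sinusoids at geometrically spaced frequencies.
    Hypothetical sketch: frequency schedule and num_freqs are
    illustrative choices, not LFM-3D's actual encoding."""
    xyz = np.asarray(xyz, dtype=np.float64)
    freqs = 2.0 ** np.arange(num_freqs)        # (F,) frequencies 1, 2, 4, ...
    angles = xyz[..., None] * freqs * np.pi    # (N, 3, F)
    # sin/cos pair per coordinate and frequency, flattened per point
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(xyz.shape[0], -1)       # (N, 3 * 2 * F)

# example: 5 points in normalized object coordinates
emb = fourier_positional_encoding(np.random.rand(5, 3))
# emb.shape == (5, 48) for num_freqs=8
```

The resulting per-keypoint embedding can then be concatenated with (or added to) the 2D descriptor features fed to the graph neural network matcher.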
NAVI: Category-Agnostic Image Collections with High-Quality 3D Shape and Pose Annotations
Recent advances in neural reconstruction enable high-quality 3D object
reconstruction from casually captured image collections. Current techniques
mostly analyze their progress on relatively simple image collections where
Structure-from-Motion (SfM) techniques can provide ground-truth (GT) camera
poses. We note that SfM techniques tend to fail on in-the-wild image
collections such as image search results with varying backgrounds and
illuminations. To enable systematic research progress on 3D reconstruction from
casual image captures, we propose NAVI: a new dataset of category-agnostic
image collections of objects with high-quality 3D scans along with per-image
2D-3D alignments providing near-perfect GT camera parameters. These 2D-3D
alignments allow us to extract accurate derivative annotations such as dense
pixel correspondences, depth and segmentation maps. We demonstrate the use of
NAVI image collections on different problem settings and show that NAVI enables
more thorough evaluations that were not possible with existing datasets. We
believe NAVI is beneficial for systematic research progress on 3D
reconstruction and correspondence estimation. Project page:
https://navidataset.github.io
Comment: NeurIPS 2023 camera ready.
Global Features are All You Need for Image Retrieval and Reranking
A well-established image retrieval system follows a two-stage paradigm: coarse
image retrieval followed by precise reranking. It has long been accepted that
local features are imperative for the subsequent reranking stage, but they
require sizeable storage and computing capacity. We propose, for the first
time, an image retrieval paradigm that leverages global features only,
enabling accurate and lightweight image retrieval for both coarse retrieval
and reranking - hence the name SuperGlobal.
It consists of several plug-in modules that can be easily integrated into an
already trained model for both the coarse retrieval and reranking stages. This
series of approaches is inspired by an investigation into Generalized Mean
(GeM) Pooling. With these tools, we strive to defy the notion that local
features are essential for a high-performance image retrieval paradigm. Extensive
experiments demonstrate substantial improvements compared to the state of the
art in standard benchmarks. Notably, on the Revisited Oxford (ROxford)+1M Hard
dataset, our single-stage results improve by 8.2% absolute, while our two-stage
version's gain reaches 3.7% with a strong 7568X speedup. Furthermore, when the
full SuperGlobal is compared with the current single-stage state-of-the-art
method, we achieve roughly 17% improvement with a minimal 0.005% time overhead.
Code: https://github.com/ShihaoShao-GH/SuperGlobal
Comment: Accepted to ICCV 202
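The SuperGlobal abstract credits an investigation into Generalized Mean (GeM) Pooling as the inspiration for its modules. GeM pools a CNN feature map spatially with a learnable exponent p, interpolating between average pooling (p = 1) and max pooling (p -> infinity). The sketch below shows the standard GeM formula only; it is a generic illustration, not SuperGlobal's actual modules.

```python
import numpy as np

def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized Mean (GeM) pooling over the spatial dimensions
    of a feature map of shape (C, H, W):
        g_c = ( mean_{h,w} x[c,h,w]**p ) ** (1/p)
    p=1 recovers average pooling; large p approaches max pooling.
    Generic GeM sketch, not SuperGlobal's exact module."""
    x = np.clip(np.asarray(feature_map, dtype=np.float64), eps, None)
    flat = x.reshape(x.shape[0], -1)                   # (C, H*W)
    return np.mean(flat ** p, axis=1) ** (1.0 / p)     # (C,) descriptor
```

The pooled (C,)-dimensional vector serves as the global image descriptor that is compared (e.g. by cosine similarity) during retrieval and, in SuperGlobal's case, reused for reranking instead of local features.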